EXECUTIVE SUMMARY


The  Federal  Aviation  Administration  Airway  Facilities  (AF)  environment  is  designing  and 
developing  a  new  operations  concept,  which  will  result  in  a  new  way  of  conducting  their 
business.  Their  focus  is  on  improving  customer  satisfaction  in  managing  the  National  Airspace 
System  (NAS)  infrastructure  services. 

The  new  AF  concept  consolidates  management  and  maintenance  functions  into  fewer,  more 
centralized  facilities,  combined  with  an  increase  in  remotely  monitored,  unmanned  facilities. 
Three  centrally  located,  regional  Operations  Control  Centers  will  be  responsible  for  monitoring 
and  controlling  the  facilities  in  their  region,  assigning  personnel  and  resources,  and  coordinating 
AF  and  Air  Traffic  information. 

Engineering  research  psychologists  from  the  NAS  Human  Factors  Branch  (ACT-530)  conducted 
this  study  to  identify  potential  causal  factors  of  human  errors,  classify  errors  by  type,  and 
investigate  strategies  to  mitigate  the  occurrence  of  errors.  They  researched  documents  on  current 
AF operations, analyzed reported human errors, interviewed presently assigned NAS Operations
Managers  and  NAS  Specialists,  and  consulted  with  subject  matter  experts.  This  report 
summarizes  and  documents  the  results  of  the  research. 

The  study  identified  three  major  sources  of  error.  These  included:  communication  and 
coordination,  the  introduction  of  new  software  or  equipment,  and  procedural  errors.  Although 
fatigue  related  to  shift  work  was  not  directly  traceable  as  a  causal  factor  in  the  database  analysis, 
structured  interviews  with  AF  specialists  indicated  that  fatigue  related  to  shift  work  might  indeed 
be  related  to  some  of  the  errors  that  occur.  To  mitigate  the  errors  identified  in  this  study, 
additional  research  on  ways  to  optimize  communication  and  coordination  is  needed. 

Additionally,  special  attention  should  be  paid  to  the  prevention  of  errors  any  time  new  systems, 
equipment,  or  procedures  are  introduced.  Furthermore,  it  may  be  worthwhile  to  conduct  a  study 
examining  the  contribution  of  fatigue  related  to  shift  work  on  human  error  in  AF. 

Researchers  also  identified  procedural  errors  as  an  area  of  potential  concern  for  the  future  AF 
environment.  They  recommend  that  designers  pay  special  attention  to  developing  clear  and 
effective  procedures  and  ensuring  that  AF  specialists  are  adequately  trained  on  the  procedures. 
Human factors research may help in this area by identifying procedures that would benefit from
checklists  or  other  user  aids. 

Finally,  the  researchers  also  identified  the  need  for  better  tracking  of  human  errors,  not  for  the 
purposes  of  assigning  blame,  but  rather  to  identify  possible  design  or  procedural  shortcomings. 
Tracking  the  causal  factors  leading  to  human  error  will  enhance  error  mitigation  efforts. 
1. Introduction


The  Federal  Aviation  Administration  (FAA)  faces  many  challenges  in  its  plans  to  modernize  the 
National Airspace System (NAS) Airway Facilities (AF) environment. Modernization of the
existing NAS AF environment necessitates changes in technology, workload, and procedures.
These changes have the potential to introduce new types of errors and compound existing
sources of error.

1.1  Background 

In  the  current  AF  environment,  there  exists  one  National  Maintenance  Control  Center  (NMCC) 
located  in  Herndon,  VA  and  approximately  40  Maintenance  Control  Centers  (MCCs) 
strategically  placed  throughout  the  United  States.  The  NMCC  provides  national  coordination  of 
facility  restoration  and  monitoring  of  critical  situations  that  have  a  national  impact.  The  MCCs 
schedule,  coordinate,  and  track  personnel  and  equipment  resources  and  perform  certification, 
maintenance,  and  restoration  of  system/services  and  equipment. 

To  meet  the  needs  of  NAS  modernization,  AF  is  reorganizing  itself  and  changing  how  it  does 
business.  AF  plans  to  consolidate  its  management  and  maintenance  functions  into  fewer,  more 
centralized  facilities  combined  with  an  increase  in  remotely  monitored,  unmanned  facilities. 
Centrally  located  Operations  Control  Centers  (OCCs)  will  be  responsible  for  monitoring  and 
controlling  these  facilities,  assigning  personnel  and  resources,  and  coordinating  AF  and  Air 
Traffic  (AT)  information.  The  new  AF  environment  will  include  one  National  OCC  (NOCC), 
three  regional  OCCs,  32  Service  Operations  Centers,  and  numerous  Work  Centers  located 
throughout  the  United  States. 

In  their  modernization  efforts,  AF  has  outlined  seven  major  measures  of  success:  reduced 
equipment-caused  delays;  increased  customer  satisfaction;  reduced  number  of  outages;  reduced 
duration  of  outages;  increased  number  of  favorable  inspection  reports;  reduced  time  and  cost  to 
implement  new  technologies;  and  improved  employee  job  satisfaction  (FAA,  1997).  The  1999 
National  Aviation  Research  Plan  reinforces  these  measures  in  their  Air  Traffic  Services 
performance  plan,  which  includes  goals  to  increase  system  safety,  decrease  system  delays, 
increase  system  flexibility,  increase  system  predictability,  increase  user  access,  increase 
availability  of  critical  systems,  and  increase  productivity. 

Improvements  in  technology  are  already  moving  in  a  positive  direction  toward  accomplishing  the 
AF  goals.  However,  these  goals  cannot  be  achieved  without  focusing  on  the  human  side  of  the 
equation.  According  to  the  NAS  Architecture  version  4.0,  “Advances  in  technology  have 
increased  the  reliability  of  most  NAS  components;  however,  the  number  of  accidents  and 
incidents attributed to human error has remained constant” (FAA, 1999, p. 10-9). Increasing
workload, changes in organizational structure, and working with new and different systems
can all contribute to an increase in human error.

1.1.1  Why  look  at  human  error? 


Researching  human  error  makes  economic  sense.  Human  error  can  cause  direct  costs  in  physical 
damage  to  equipment  and  indirect  costs  incurred  by  increased  numbers  of  outages  and  increased 
outage duration. Although one can cite anecdotal evidence and case studies, it is difficult to
estimate the total financial impact that human error has on the system. As one example of the
impact that AF human error can have on the FAA, the airlines, and the flying public, a
single incident of human error in 1998 caused a 2-hour outage and reportedly was responsible for
265 delays (Source: AF TechNet, AF Delays Report FY 1999).

1.1.2  What  is  an  error? 


Sanders  and  McCormick  (1987)  defined  human  error  as  an  inappropriate  or  undesirable  human 
decision  or  behavior  that  reduces  or  has  the  potential  for  reducing  effectiveness,  safety,  or  system 
performance.  Two  things  should  be  noted  about  this  definition.  First,  an  error  is  defined  in 
terms  of  its  undesirable  effect  or  potential  effect  on  human  performance  and  systems  operations. 
Second,  an  action  does  not  have  to  result  in  degraded  system  performance  or  an  undesirable 
effect  on  people  to  be  considered  an  error.  It  is  enough  that  the  decision  or  action  has  the 
potential  for  adversely  affecting  the  system  operations  or  human  performance  for  it  to  be 
considered  an  error. 

Human  error  can  take  many  forms.  Often  human  error  in  the  NAS  is  defined  in  the  context  of 
safety  (accidents,  incidents),  events  that  clearly  violate  set  standards  (e.g.,  operational  errors  in 
the  Air  Traffic  Control  realm),  or  personnel-induced  outages.  These  can  be  thought  of  as  the 
more  severe  results  of  human  error  but  also  the  ones  that  are  the  most  evident.  The  most 
frequent  types  of  human  error  do  not  result  in  compromised  safety,  operational  errors,  or 
outages.  Instead,  most  errors  are  caught  before  they  cause  any  problem.  Anecdotal  evidence 
from  specialists  underscores  that  for  every  one  outage  that  occurs,  there  are  multiple  “saves.”  (A 
save  refers  to  an  incident  or  event  that  could  have  resulted  in  an  outage  but,  due  to  the  efforts  of 
a  specialist,  the  outage  was  averted.) 

In  the  analysis  of  human  error  in  industrial  plants,  80  to  85%  of  errors  that  occur  are  attributable 
not  to  human  characteristics,  but  to  error-likely  conditions  (Steinbrink,  1997).  In  these 
situations,  people  are  “set  up”  for  error  by  the  system  design.  These  error-inducing  situations 
include  deficient  procedures,  poor  communication,  inadequate  training,  misleading  information, 
and  poor  equipment  design.  Many  of  these  errors  are  entirely  preventable.  Identifying  error- 
likely  situations  is  a  first  step  toward  minimizing  or  eliminating  errors. 

1.1.3  How  can  we  look  at  errors  that  currently  exist  in  Airway  Facilities? 

In  addressing  human  error  reduction,  Wiener  (1988)  states  that  the  first  step  in  error  reduction  is 
to  identify  the  errors.  Identifying  human  errors  is  not  always  a  straightforward  task.  Errors  can 
be  obvious  but  are  more  often  subtle.  There  are  six  methods  of  identifying  potential  errors  that 
have  been  used  successfully  in  other  areas.  They  include  brainstorming  using  representatives  of 
the  area  of  interest;  critical  incident  techniques;  structured  walkthroughs  or  reviews  of  standards, 
procedures,  or  systems;  surveys  and  questionnaires;  observation;  and  analysis  of  confidential 
reporting  systems.  Once  potential  errors  are  identified,  a  risk  assessment  should  be  done  to 
weigh  the  potential  errors  according  to  their  severity.  The  next  steps  in  effective  error 
management  are  to  identify  the  current  defenses,  evaluate  the  effectiveness  of  the  current 
defenses,  and  identify  additional  defenses  needed.  Finally,  an  effective  mechanism  should  be 
established  for  reporting  potential  errors  and  addressing  them  once  they  are  identified. 

A prior study on human error in OCCs used a combination of structured walkthrough and
brainstorming techniques to identify potential errors and propose mitigation strategies for the
future AF environment (Ahlstrom et al., 1999). Each of the errors identified was given a risk
estimate  based  on  a  combination  of  the  severity  of  the  impact  on  operations  if  the  error  did  occur 
and  the  probability  or  likelihood  that  it  would  occur.  This  study  provided  insight  on  potential 
errors  and  potential  mitigating  strategies  in  the  future  OCCs. 
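
The risk estimate described above can be illustrated with a minimal sketch. The source does not state how severity and likelihood were combined, so the multiplication used here, the 1-to-5 rating scales, and the example error names and ratings are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotentialError:
    """A potential error with hypothetical 1-5 ratings (assumed scales)."""
    name: str
    severity: int    # impact on operations if the error occurs (1 = minor, 5 = severe)
    likelihood: int  # chance that the error occurs (1 = rare, 5 = frequent)

    def risk(self) -> int:
        # Assumed combination rule: severity multiplied by likelihood.
        return self.severity * self.likelihood


def rank_by_risk(errors):
    """Return the errors sorted from highest to lowest risk estimate."""
    return sorted(errors, key=lambda e: e.risk(), reverse=True)


# Hypothetical examples, not taken from the study.
examples = [
    PotentialError("equipment taken offline without coordination", 5, 3),
    PotentialError("technician dispatched to wrong site", 3, 2),
    PotentialError("switch left in wrong position", 4, 2),
]

for err in rank_by_risk(examples):
    print(f"{err.risk():2d}  {err.name}")
```

Ranking by a single combined score keeps the comparison simple; other combination rules, such as a lookup matrix, would fit the same structure.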

The current research study takes a different approach, focusing on errors that are occurring now
rather than looking toward the future. It analyzes reported incidents of human error from the
current environment with the goal of finding ways to ensure that these errors are not carried to
the future AF environment.

2.  Method 


Engineering  research  psychologists  from  the  NAS  Human  Factors  Branch  (ACT-530)  of  the 
FAA  William  J.  Hughes  Technical  Center  conducted  this  study.  Initially,  they  researched 
functions  and  tasks  of  the  NAS  Operations  Managers  (NOMs)  and  NAS  Specialists.  They 
examined job task analyses, FAA Technical Notes, and other reports and conducted field visits to
the  NAS  Premiere  Facility  (NPF),  the  NOCC,  the  Prototype  Operations  Control  Center  (POCC), 
and  the  Pacific  Desert  Systems  (PDS)  MCC. 

2.1  Database  Analysis 


The first task in this study was to identify sources of human error. To identify human errors
that may affect the future environment, the researchers first examined the human errors
occurring in the present environment. They did this by analyzing data from AF reports that
contained information on human error.

In  AF,  cause  codes  are  assigned  to  all  service  interruptions  to  provide  data  for  future  analysis. 
The  researchers  obtained  and  analyzed  the  reported  human  errors  that  were  recorded  for  a  1-year 
period  from  July  20,  1998  to  July  20,  1999.  (Note:  They  only  focused  on  the  errors  attributed  to 
AF  personnel,  not  AT  or  contractor-induced  outages.)  They  also  obtained  and  analyzed  the  AF 
delays  report  for  the  1999  fiscal  year  for  additional  information. 

2.2  Field  Study 

When  analyzing  human  error,  researchers  would  like  to  have  quantitative  data  accurately 
recording  actual  errors  committed  by  personnel.  However,  these  data  do  not  exist  in  the  case  of 
human  error  in  MCCs,  so  they  used  structured  interviews  instead.  They  conducted  a  field  study, 
collecting  data  from  structured  interviews  and  observations  at  several  key  field  sites.  The  field 
study  included  the  following  sites:  the  NPF,  the  NOCC,  the  POCC,  and  the  PDS  Management 
Office  and  MCC. 

2.3  Error  Categorization 

Previous  research  (Ahlstrom  et  al.,  1999)  on  human  error  in  OCCs  identified  potential  errors  in 
future  OCC  operations  using  a  combination  of  structured  walkthrough  and  brainstorming 
techniques. In that study, AF subject matter experts developed and designed four scenarios to
simulate events that would take place in future OCC environments. As the participants of that
study  stepped  through  the  specially  designed  scenarios,  researchers  asked  them  to  identify 
potential  error  situations  and  propose  possible  strategies  for  preventing  or  mitigating  errors.  The 
errors  were  given  a  risk  estimate  (called  importance  in  the  study)  based  on  the  likelihood  of  the 
error  occurring  and  the  impact  on  operations  if  the  error  did  occur.  Researchers  then  asked  the 
participants  to  sort  the  errors  into  major  categories.  The  errors  with  the  highest  identified  risk 
estimate  fell  into  13  major  categories.  These  categories  formed  the  initial  list  of  potential  errors. 

The  researchers  supplemented  the  error  categories  derived  from  the  Ahlstrom  et  al.  (1999)  study 
with  information  from  the  AF  Job  Task  Analysis  (CTA,  Inc.,  1992)  and  the  Airway  Facilities 
outage  assessment  inventory  (Blanchard,  1994).  The  AF  Job  Task  Analysis  provided 
information  on  the  current  functions  (activities)  and  tasks  in  the  AF  environment,  which  include 
6  high-level  functions,  31  subfunctions,  and  548  tasks.  The  Airway  Facilities  outage  assessment 
inventory  identifies  and  maps  potentially  significant  contributors  to  AF  maintenance  downtime 
within a system structure. The outage assessment inventory is a form that has 11 categories of
factors,  conditions,  or  events  that  can  be  used  to  describe  a  functional  framework  of  the 
maintenance  process  at  the  General  National  Airspace  System.  It  represents  the  sequential 
progression  of  events,  beginning  at  the  onset  of  a  facility  outage  and  ending  when  the  facility  is 
returned to service. These 11 categories are further broken down into subcategories. The first
category  is  called  Outage  Causes,  and  there  are  12  subcategories  under  that  category.  The 
researchers  looked  at  these  12  subcategories  when  trying  to  create  a  taxonomy  that  captured 
outages  induced  by  human  error.  By  examining  previous  research  on  error  mitigation,  the  AF 
Job  Task  Analysis,  and  the  outage  assessment  inventory,  they  derived  13  categories  and  77 
subcategories  of  AF  human  errors  (see  Appendix  A). 

3.  Results 

3.1  Analysis  of  Database  Errors 

In  the  AF  database  that  records  outages,  cause  codes  are  assigned  to  all  service  interruptions  to 
provide  accurate  data  for  future  analysis.  Cause  Code  #89  is  the  code  for  unscheduled  outages  or 
service  interruptions  in  the  “Other”  category,  which  includes  outages  induced  by  AF  personnel. 
Of  the  50  Cause  Code  #89  errors  reported  in  the  ad  hoc  reports  during  the  July  20,  1998  to  July 
20, 1999 period, 35 of the incidents were attributed to AF personnel. In the AF Delays Report,
researchers identified 13 additional personnel-induced outages that did not overlap with the
ad hoc reports. They analyzed these 48 errors and attributed them to nine major categories.

a.  Procedures.  Seventeen  percent  of  the  errors  may  have  occurred  either  because  proper 
procedures  did  not  exist  or  the  specialist  may  not  have  been  aware  of  or  did  not  follow 
the  proper  procedures. 

b.  New  equipment  or  software.  Twelve  percent  of  errors  occurred  in  conjunction  with  new 
equipment  or  software  installations  or  modifications. 

c.  Communication/coordination.  Insufficient  communication  or  coordination  was  blamed 
for  10%  of  the  errors.  Errors  that  occurred  due  to  a  break  in  communication  or 
coordination  tend  to  involve  the  specialist  not  being  aware  of  the  status  of  the  equipment 
receiving maintenance. For example, a specialist took equipment off line without
coordinating  with  AT. 

d.  Labeling.  Ten  percent  of  errors  were  due  to  improperly  or  poorly  labeled  equipment  or 
equipment  that  did  not  have  a  label  but  would  benefit  from  one. 

e.  Equipment  bumps  or  trips.  Six  percent  of  the  errors  involved  switches  that  were 
inadvertently  bumped  or  cables  or  plugs  that  were  disconnected  when  someone  bumped 
into  them  or  tripped  over  them.  These  errors  were  attributed  to  a  lack  of  safety  guards  or 
insufficient  room  to  maneuver  and  usually  were  resolved  by  installing  equipment  guards 
where  necessary.  These  types  of  errors  tend  to  be  commonly  reported  and  easily  fixed. 

f. Data entry/keyboard entry errors. Six percent of errors were due to incorrect data entry.
It was not possible to determine whether the inadvertent keyboard commands were due to
specialists accidentally hitting the wrong keys (commonly called “fat-fingering” the
keyboard) or other reasons, such as the confusion of similar commands.

g. Oversights (forgetting). Four percent of the errors occurred when specialists forgot to
return  a  switch  to  the  correct  position  after  maintenance. 

h.  Incorrect  information.  Two  percent  of  the  errors  were  attributed  to  specialists  using 
incorrect  information  such  as  drawings  or  schematics. 

i.  Other.  This  category  doesn’t  seem  very  descriptive,  but  33%  of  the  incident  descriptions 
in  the  report  did  not  contain  sufficient  information  to  properly  categorize  the  data  (e.g., 
“frequency  interruption  by  FAA  personnel  prevented  landings”).  This  description  does 
not  give  enough  information  about  the  source  of  the  error  to  be  useful.  There  could  be 
numerous  causes  for  such  an  error.  Was  this  error  a  violation  of  proper  procedures?  If 
so,  this  kind  of  violation  could  have  been  caused  by  a  lack  of  experience  or  training, 
excessive  workload,  fatigue,  or  poor  equipment  design.  Table  1  lists  some  examples  of 
error  descriptions  with  insufficient  detail. 

Table  1.  Error  Descriptions  With  Insufficient  Detail  to  Categorize 


Frequency  interruption  by  FAA  personnel  prevented  LDA  landings. 

Power  interruption  occurred  during  installation  of  a  new  power  panel. 

The  computer  operator  inadvertently  reloaded  the  program  on  the  on-line  processor  causing  an  interruption. 
The  reload  was  performed  following  a  switchover  due  to  buffer  overload  caused  by  a  printer  failure. 

Specialists  disconnected  beacon  control  unit  cable  while  troubleshooting  system. 

Specialists  powered  down  Cabinet  4  and  entire  system  failed.  Ops  transitioned  to  ADW  sensor. 

Radar  data  failed. 
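
As a rough illustration of how category shares like those listed above could be computed from categorized incident records, the following sketch tallies labels and reports each category's percentage. The function name `category_shares` and the sample labels are hypothetical; the counts are invented for the example and are not the study's data.

```python
from collections import Counter


def category_shares(labels):
    """Return {category: percent of incidents}, rounded to whole percents."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders categories from largest to smallest count.
    return {cat: round(100 * n / total) for cat, n in counts.most_common()}


# Hypothetical labels for ten incidents (not the study's data).
sample = ["Other"] * 4 + ["Procedures"] * 3 + ["Labeling"] * 2 + ["Oversights"]

for category, percent in category_shares(sample).items():
    print(f"{category}: {percent}%")
```

In a tally of this kind, a large "Other" share signals that many incident descriptions lack the detail needed for categorization.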


We have plotted incidents by month and time of day in Figure 1. This figure implies that most
errors  occur  during  the  evening  hours  with  the  highest  proportion  occurring  between  8:00  PM 
and  midnight  followed  by  the  4:00  PM  to  8:00  PM  time  slot.  These  data  seem  to  imply  a 
connection between time of day and errors. However, these conclusions are premature based on
the  limited  available  data.  A  possible  explanation  is  that  managers  are  scheduling  riskier  work 
during  times  when  the  impact  to  AT  operations  may  be  minimized.  There  are  many  other 
possible  explanations  as  well,  and  to  decide  on  a  particular  course  of  action  would  require  more 
extensive  research,  data,  and  analysis.  The  point  is  that,  with  accurate  data  on  what  errors  occur 
and  when  they  occur,  preventative  actions  may  be  instituted  to  minimize  such  errors. 

Figure 1. Incidents Plotted by Month and Time of Day.
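
A minimal sketch of the kind of time-of-day grouping discussed above follows. It assumes that each incident record carries a timestamp and that the day is divided into 4-hour slots, a width inferred from the slots named in the text (4:00 PM to 8:00 PM, 8:00 PM to midnight). The function names and the sample timestamps are hypothetical and are not the study's data.

```python
from collections import Counter
from datetime import datetime

SLOT_HOURS = 4  # assumed slot width, inferred from the slots named in the text


def time_slot(ts: datetime) -> str:
    """Return the 4-hour slot label (24-hour clock) containing the timestamp."""
    start = (ts.hour // SLOT_HOURS) * SLOT_HOURS
    return f"{start:02d}:00-{start + SLOT_HOURS:02d}:00"


def incidents_per_slot(timestamps):
    """Count incidents in each 4-hour slot of the day."""
    return Counter(time_slot(ts) for ts in timestamps)


# Hypothetical incident times (not the study's data).
incidents = [
    datetime(1998, 8, 3, 21, 15),
    datetime(1998, 11, 17, 22, 40),
    datetime(1999, 2, 9, 17, 5),
    datetime(1999, 5, 21, 9, 30),
]

for slot, count in sorted(incidents_per_slot(incidents).items()):
    print(slot, count)
```

Counts like these describe when reported errors occurred, not why; as the text cautions, scheduling practices and other factors would need to be examined before drawing conclusions.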


Almost  all  of  the  errors  reported  in  this  database  were  attributed  to  field  technicians.  However, 
there  appears  to  be  little  record  of  MCC  errors.  Lack  of  documented  MCC  errors  does  not  mean 
that  MCC  specialists  never  commit  errors.  AF  specialists  from  MCCs  indicated  that  MCC  errors 
might  be  less  likely  to  be  directly  linked  to  an  outage.  According  to  the  AF  specialists,  some  of 
the  errors  that  MCCs  might  make  include  calling  a  field  technician  who  was  not  available  or 
sending field technicians to the wrong site. Thus, to supplement the data from the AF databases,
the researchers conducted structured interviews with AF specialists in the field, particularly
focusing on those who work in MCCs.

3.2  Field  Study 

Field  interviews  with  AF  specialists  revealed  some  common  types  of  human  error  that  occur  in 
MCCs  (Appendix  B).  The  predominating  causes  of  current  human  error  are  described  in  the 
following  list. 

a. Communication/coordination errors - During field interviews, specialists rated
communication errors as the principal current and potential source of errors. The AF Job
Task Analysis (CTA, Inc., 1992) lists 548 tasks for which the NOMs and NAS Specialists
are responsible in the current work environment. A large number of these tasks, 245, are
voice-communication tasks. This suggests that much of the specialists’ time is spent
on transfer or exchange of information with another person by telephone or
face-to-face. Communication problems may arise due to failures in communication among OCC
team  members  or  between  OCC  team  members  and  others  (e.g.,  terminology  differences 
between AF and AT). In a recent example of an outage caused by a communication error,
terminal radar service was lost during a transfer from engine generator to commercial power
that was not coordinated with the FAA Terminal Radar Approach Control.

b. Errors due to incomplete or incorrect information - Specialists reported that status
information  and  information  in  other  databases  are  not  always  maintained  and  up-to-date. 
This  can  cause  errors  such  as  calling  a  field  technician  who  is  unavailable  to  fix  a 
problem  thus  increasing  outage  durations.  They  also  indicated  that  weather  plays  a 
critical  factor  in  AF  decision-making.  However,  observation  and  structured  interviews 
revealed  that  specialists  often  do  not  have  current  weather  information  for  their  area. 

c.  Critical  facility  errors  -  Critical  facility  errors  result  from  not  being  aware  of  the  impact 
of  events  and  resolution  on  other  facilities.  AF  services,  facilities,  and  equipment  have 
differing  levels  of  criticality  under  different  circumstances  based  on  the  current  status  of 
other  NAS  elements.  An  example  of  a  critical  facility  error  would  be  taking  a  facility 
offline  for  maintenance  when  it  is  required  for  backup  purposes. 

d.  Shift  work  errors  -  Although  there  is  a  vast  amount  of  literature  on  shift  work,  to  date, 
the  contribution  shift  work  in  AF  may  make  to  human  error  is  unknown.  Shift  work  is 
not  an  issue  for  all  MCCs  in  that  many  are  only  open  during  regular  working  hours 
(however,  OCCs  are  intended  to  be  open  24  hours  a  day,  7  days  a  week).  Accurate  data 
on  when  errors  are  occurring  can  give  insight  on  the  contribution  shift  work  may  have  to 
these  errors. 

e.  Workload  errors  -  Work  in  the  MCCs  tends  to  come  in  waves;  that  is,  many  events  will 
occur  within  a  short  time,  causing  a  very  high  workload  followed  by  a  period  of  lower 
workload.  This  phenomenon  is  documented  in  the  workload  analysis  study  (AFHF 
Research,  Engineering,  and  Development,  1997).  Specialists  reported  that  during  high- 
workload  periods,  it  is  easy  for  the  specialist  to  get  interrupted  while  performing  an 
action  and  consequently  forget  to  complete  the  action. 

These  errors  overlapped  with  the  potential  errors  anticipated  by  specialists  in  conjunction  with 
changing  from  an  MCC  to  an  OCC  environment.  They  expressed  concern  that  the  changes  from 
an  MCC  working  environment  to  an  OCC  working  environment  (and  the  associated  increase  in 
geographic  area  of  responsibility)  would  cause  increases  in  these  types  of  errors  and  introduce 
new  errors  associated  with  learning  new  business  practices. 

According  to  the  specialists  interviewed,  the  following  categories  of  errors  have  the  potential  of 
increasing  with  the  introduction  of  OCCs: 

a.  Procedures/business  practice  errors  -  Occasionally,  errors  occur  because  procedures  are 
unclear  or  are  not  followed.  This  may  be  due  to  lack  of  training  on  the  part  of  the 
specialist  or  memory  overload.  These  errors  occur  in  the  present  MCCs,  but  there  is  also 
the  potential  for  increased  human  error  of  this  type  with  the  introduction  of  new 
procedures  and  business  practices  associated  with  the  OCCs.  As  one  specialist  said,  “In 
the  MCCs,  everyone  is  a  generalist.  In  the  OCC,  the  need  to  communicate  and 
collaborate  between  specialty  positions  is  especially  important  and  could  potentially  be 
problematic.” 

b.  Remote  maintenance  monitoring  (RMM)  errors  -  An  increasing  number  of  AF 
communication,  navigation  and  surveillance  facilities  including  both  hardware  and 
software  are  being  remotely  monitored.  RMM  interfaces  for  different  facilities  are  not 
always  consistent  with  one  another  or  well  integrated  into  the  current  system. 
Furthermore,  some  MCC  specialists  were  not  familiar  with  using  RMM  to  do  remote 
certification. Interface design and integration should be examined for usability, and
specialists  should  be  trained  on  the  use  of  RMM. 

c.  Insufficient  training/insufficient  experience  errors  -  The  AF  workforce  is  aging  and  new 
specialists  are  replacing  those  with  years  of  experience.  Many  of  the  specialists  that  will 
work  in  the  OCCs  may  come  from  the  field  and  may  lack  experience  in  a  monitor  and 
control  environment.  By  consolidating  operations,  the  OCCs  will  risk  losing  area- 
specific  knowledge  that  the  specialists  at  the  MCCs  have  gained  over  the  years.  Some 
examples given were a VORTAC that was accessible only by boat and Alaskan
moose that tended to attack equipment during mating season.

3.3  Current  Error  Mitigation  Strategies 

The  general  strategy  for  error  mitigation  is  to  limit  the  occurrence  of  errors  and  to  limit  the 
consequences  of  errors  if  they  do  occur.  Before  any  corrections  are  introduced  to  minimize 
deleterious  effects  or  prevent  errors  from  occurring,  the  error-prone  areas  must  first  be  identified. 
Once  these  are  defined,  appropriate  mitigation  strategies  can  be  developed.  For  example,  a 
nuclear  power  industry  analysis  of  errors  found  that  reassembly  is  more  error  prone  than 
disassembly  (Maddox,  1998).  The  key  is  to  know  where  the  errors  are  likely  to  occur  and  draw 
attention  to  this  (wrong  reassembly  may  not  be  obvious  on  later  inspection). 

Examination of the lessons-learned database, which addresses the human-error incidents
recorded in the AF databases, found that the primary solutions were to counsel the person who
committed the error, to install equipment guards where necessary, and to create new procedures
to avoid the situation.

The  management  and  staff  at  the  POCC  were  aware  of  the  probability  of  many  of  the  errors 
revealed  by  the  field  interviews  and  were  working  with  the  NAS  Infrastructure  Management 
Project  Office  (AOP-30)  to  develop  mitigation  strategies. 

4.  Conclusions 


How  do  we  ensure  that  we  learn  from  our  mistakes  instead  of  being  doomed  to  repeat  them? 
First,  the  data  analyzed  in  this  study  show  that  there  is  a  need  for  more  accurate  error-tracking 
systems  in  AF.  Errors  cannot  be  investigated  and  analyzed  until  they  are  identified.  In  the 
majority  of  the  cases  reported,  there  was  insufficient  information  to  remedy  the  error-inducing 
situations  in  which  the  AF  specialists  find  themselves.  AF  maintains  a  lessons-learned  database 
that,  among  other  things,  contains  information  on  outages  caused  by  human  error  and  the  steps 
taken  to  prevent  future  errors.  Presently,  the  outage  reporting  system  (of  which  the  ad  hoc  and 
lessons-learned  reports  are  a  part)  is  the  method  of  tracking  human  error  in  the  AF  environment. 
However,  this  system  was  not  put  into  place  to  examine  human  error  but  rather  to  track 
equipment  performance.  There  is  a  definite  need  to  develop  a  new  method  of  tracking  AF 
human  errors.  However,  the  purpose  must  be  to  document  the  errors,  investigate  them,  and  come 
up  with  solutions,  not  to  place  blame. 

Establishing  an  error-reporting  system  has  many  advantages.  It  can  create  a  framework  for 
critically evaluating and continually improving the integrity of AF and allow for the
identification of error-prone tasks, including tasks that might be
automated. Error-reporting systems have also been used as a formal communication channel for
identifying  weaknesses  in  procedures  and  equipment  design,  allowing  for  preventative  corrective 
action  before  the  problem  becomes  an  outage.  This  approach  allows  AF  to  take  a  proactive 
rather  than  reactive  approach  to  errors,  a  positive  step  toward  meeting  their  goals  of  minimizing 
the  occurrence  and  duration  of  outages. 

There  are  also  many  restraining  forces  to  the  establishment  of  an  error-reporting  system.  In 
general, people tend to resist change and may be concerned about the risk of misuse of such a
system.  It  is  important  when  establishing  such  a  system  that  the  focus  is  on  the  system  or 
situation  instead  of  individual  blame.  In  order  to  identify  and  mitigate  error-causing  situations, 
AF  needs  a  system  that  encourages  accurate  reporting  and  provides  protection  for  the  respondent. 
Such a system requires a commitment on the part of the FAA, setting aside resources at a time
when the FAA is already faced with a tight budget.

In  spite  of  restraining  forces,  anonymous  reporting  systems  have  been  used  effectively  in  many 
other  areas,  particularly  for  reporting  personnel  errors  or  hazards  that  may  lead  to  safety 
violations.  Some  examples  of  such  systems  are  the  Aviation  Safety  Reporting  System, 
Maintenance  Error  Decision  Aid,  and  Managing  Engineering  Safety  Health.  Recently,  the  FAA 
announced  a  new  initiative  called  the  Aviation  Safety  Action  Program.  This  program  is  intended 
to  address  safety  issues,  but  a  similar  system  could  be  used  to  examine  human  error  in  AF. 

Second,  new  systems  and  communication/coordination  are  leading  categories  of  AF  error  both  in 
the  interviews  and  in  the  analysis  of  errors  from  the  database.  Communication  and  coordination 
errors  were  also  the  top  potential  errors  identified  in  a  previous  study  (Ahlstrom  et  al.,  1999). 
Coordination  points  and  channels  of  communication  should  be  clearly  defined  and  may  need  to 
be  included  on  checklists  or  other  mnemonic  aids.  Communication  about  the  equipment  status  is 
of  particular  concern  and  can  lead  to  critical  facility  errors.  Redundant  channels  of 
communication  should  be  identified  and  eliminated,  and  research  should  be  conducted  on  ways 
to  enhance  the  communication  and  coordination  process,  particularly  in  relation  to  facility  status. 
Special  attention  should  be  paid  to  the  effect  that  new  systems  or  equipment  will  have  on  the 
existing  system.  Any  new  system  or  equipment  should  be  evaluated  for  its  potential  impact  on 
existing  systems  before  installation.  Specialists  should  have  clear  instructions  on  how  to  install 
new  systems  without  compromising  existing  systems. 

Third, the high percentage of procedural errors identified in the analysis of the database
is an area of potential concern for the future. It is essential that the future AF environment pay
attention  to  developing  clear  and  effective  procedures  and  ensuring  that  AF  specialists  are 
adequately  trained  on  the  procedures.  Future  research  is  warranted  in  this  area  to  identify 
procedures  that  may  benefit  from  checklists  or  other  user  aids. 

Finally,  one  of  the  concerns  expressed  in  the  structured  interviews  is  the  contribution  of  shift 
work  and  shift  work-related  fatigue  to  the  commission  of  errors.  Although  there  is  a  wealth  of 
literature  on  the  effects  of  shift  work  and  fatigue  and  AF  has  worked  shifts  for  many  years,  there 
is no current research on the effects of shift work and fatigue on AF errors.

ACRONYMS


AF      Airway Facilities
AT      Air Traffic
FAA     Federal Aviation Administration
MCC     Maintenance Control Center
NAS     National Airspace System
NMCC    National Maintenance Control Center
NOCC    National Operations Control Center
NOM     NAS Operations Manager
NPF     NAS Premier Facility
OCC     Operations Control Center
PDS     Pacific Desert System Management Office
POCC    Prototype Operations Control Center
RMM     Remote Maintenance Monitoring

Appendix A

AFCC  Potential  Human  Error  Categories  and  Subcategories 

1.  Communication Errors

1.1.  Failure  of  Air  Traffic  (AT)  and  Airway  Facilities  (AF)  to  communicate  effectively 

1.2.  Failure  to  acknowledge  and  coordinate  information 

1.3.  Failure to report aircraft accident/incident in a timely manner to NAS Operations Control Center (NOCC)

1.4.  Misunderstandings  between  Operations  Control  Center  (OCC)  Specialists  and  System  Management  Office 
(SMO) 

1.5.  Interruption problems due to communicating over remote link versus face-to-face

1.6.  OCC  specialists  and  field  specialists  terminology  differences 

2.  Errors  Due  to  Incomplete  or  Incorrect  Information 

2.1.  Status board information incorrect, insufficient, or misleading

2.2.  Provides  incorrect  status  information  to  others 

2.3.  Equipment  certification  not  always  known 

2.4.  Inaccurate  status 

2.5.  Lack  of  access  to  real  time  data 

2.6.  Not  able  to  locate  correct  contact  information 

2.7.  Inability  to  verify  if  backup  is  in  service 

2.8.  Database  Errors 

2.8.1.  Contact  database  out  of  date 

2.8.2.  Directions  to  site  needing  service  not  available 

2.8.3.  Database  is  incomplete,  not  all  needed  data  has  been  entered 

2.9.  Fault  history  information  not  available  to  specialist 

3.  Critical  Facility  Errors 

3.1.  Difficulty in tracking the role of each facility under different operating conditions

4.  Event  Ticket  Errors 

4.1.  Lack  of  responsibility  for  an  event  ticket 

4.2.  Failure  to  update  event  ticket 

4.3.  Incorrect  event  ticket  closeout  (i.e.,  ticket  closed  when  it  should  remain  open,  or  open  when  it  should  be 
closed) 

4.4.  Data  entry  errors 

4.5.  Multiple  event  tickets  are  open  for  a  single  event 

4.6.  Failure  to  open  event  tickets  in  a  timely  manner 

4.7.  Delays caused by retrieving the wrong event ticket

4.8.  OCC specialist overreliance on information on event tickets

4.9.  Incomplete  event  tickets 

4.10.  Event  ticket  procedures  not  standardized 

4.11.  Incomplete problem description on ticket

4.12.  Event  tickets  not  opened  due  to  many  events  occurring  at  the  same  time 

4.13.  Use  of  confusing  acronyms 

4.14.  Wrong  priority  for  event  ticket 

4.15.  OCC  specialist  does  not  understand  event  ticketing  system 

4.16.  Event ticketing system fails; specialist not familiar with backup plan

5.  Procedural  Errors 

5.1.  Unclear  backup  plans 

5.2.  Procedure  documents  in  use  may  not  be  the  latest  versions,  or  long  overdue  for  updates 

5.3.  Non-standard  procedures,  varied  from  region  to  region 

5.4.  Specialist  on  call  may  not  know  specific  procedures  to  solve  a  problem,  which  may  indicate  the  wrong 
specialist  was  sent  to  the  site 

5.5.  Fails to prioritize, resulting in more critical work being delayed

5.6.  Fails  to  switch  to  backup  system  in  a  timely  manner 

5.7.  Fails  to  attempt  system  reset  in  a  timely  manner 

5.8.  Loses  situational  awareness  of  status  of  technician  in  travel,  or  at  site 

6.  Certification  error 

6.1.  Failure  to  certify  systems  in  a  timely  manner  (allowing  system  to  exceed  its  maximum  certification 
interval) 

6.2.  Improper certification (certifying an uncertifiable system), i.e., certifying the system when a component
facility is not certified

6.3.  Loses  situational  awareness  of  status  of  leased  services  and  the  ongoing  activities  of  the  providers  (leads  to 
no  follow-up) 

6.4.  Loses situational awareness on the status of NOTAM (Fails to assure cancellation of NOTAM after a facility
is returned to service)

6.5.  Errors  related  to  scheduled  outages 

6.6.  Fails to follow up on field specialist request for a scheduled outage. May cause an unscheduled outage
due to specialist not being able to replace a failing component

6.7.  Fails to inform technician that approval for a scheduled outage is withdrawn. Technician may assume it is
"ok" to remove facility from service.

6.8.  Flight  check  coordination  errors 

6.9.  Fails  to  advise  field  specialist  of  change  in  flight  check  schedule 

6.10.  Fails to inform AT of impending flight check. Results in significant impact to service provided by AT or
canceling the scheduled flight check.

7.  Remote  Maintenance  Subsystem  (RMS)  Errors 

7.1.  Unfamiliar  with  Remote  Maintenance  Monitoring  (RMM)  capabilities 

7.2.  Unable to update RMM parameters

8.  Insufficient Training/Insufficient Experience Errors

8.1.  Inaccurate diagnosis of problem

8.2.  Poor troubleshooting methods

8.3.  Unfamiliar with interaction protocol with others

8.4.  Specialist does not hear alarm

8.5.  Not comfortable with assuming new AFCC functions

8.6.  OCC Specialist not familiar with OCC's boundaries/domain

8.7.  Specialist not comfortable with MMS procedures

8.8.  Specialist not familiar with non-FAA organizations that could impact National Airspace System (NAS)
infrastructure operations (e.g., Department of Defense (DOD) and air shows)

8.9.  Forgets to update event reports as needed (i.e., outages, accidents, incidents, etc.)

8.10.  Unfamiliar with reporting aircraft accident/incident

8.11.  Unable to locate the Logical Unit Identification (LUID) screen in alarm/alert

8.12.  Fails to recognize faults or degradation of services

8.13.  Is not situationally aware and makes errors in setting restoration priority

8.14.  Does not recognize need for preemptive action such as starting an engine generator prior to arrival of
severe weather.

8.15.  Taking a preemptive action without knowing or realizing the current status. For example, trying to start
the engine generator while an environmental specialist is working on the engine generator.

9.  Errors due to lack of documentation

9.1.  May impact trend analysis (i.e., incomplete record of past failure may indicate erroneous trends)

9.2.  Scheduling work that has already been completed

10.  Hazardous Materials Errors

10.1.  Failure to recognize hazardous materials (PCBs, battery acid, transformer oil)

11.  Staffing Induced Errors

11.1.  Multiple failures in a geographic or specialty area could require a range of knowledge and skills beyond that
possessed by one specialist

11.2.  Heavy workload

11.3.  Manpower in times of crises

11.4.  Shift work errors

12.  Lack of room to maneuver - bumps and trips

13.  Labeling errors

Appendix B

POCC  ERROR  MITIGATION  QUESTIONS 

1.  What do you see as major changes from the MCC working environment to the OCC
environment  that  may  cause  human  errors? 

2.  What  do  you  anticipate  to  be  the  major  causes  of  human  errors  in  the  OCC  environment? 

3.  According to the AF Job Task Analysis (1992), nearly 50% of the Specialists' tasks were
communication  tasks.  A  recent  study  indicated  that  poor  communications  was  the  major 
cause  of  human  errors.  Do  you  see  this  as  a  problem  within  the  OCCs? 

4.  What  are  other  potential  causes  of  human  error  in  AF? 

5.  Do  you  have  any  suggestions  on  how  to  reduce  voice  communications  in  the  OCCs? 

6.  Would  you  like  to  see  an  Artificial  Intelligence  system  added  to  the  OCCs? 

7.  Is  a  record  added  to  a  database  for  incident  tracking? 

8.  How accurate do you think the "status" information will be in the OCCs?